A stochastic process' statistical complexity stands out as a fundamental property: the minimum information required to synchronize one process generator to another. How much information is required, though, when synchronizing over a quantum channel? Recent work demonstrated that representing causal similarity as quantum state-indistinguishability provides a quantum advantage. We generalize this to synchronization and offer a sequence of constructions that exploit extended causal structures, finding a substantial increase in the quantum advantage. We demonstrate that maximum compression is determined by the process' cryptic order, a classical, topological property closely allied to Markov order, itself a measure of historical dependence. We introduce an efficient algorithm that computes the quantum advantage and close by noting that the advantage comes at a cost: one trades off prediction for generation complexity.


Discovering and describing correlation and pattern are critical to progress in the physical sciences. Observing the weather in California last summer, we find a long series of sunny days interrupted only rarely by rain, a pattern now all too familiar to residents. Analogously, a one-dimensional spin system in a magnetic field might have most of its spins “up” with just a few “down”, defects determined by the details of spin coupling and thermal fluctuations. Though nominally the same pattern, the domains of these systems span the macroscopic to the microscopic, the multi-layer to the pure. Despite the gap, can we meaningfully compare these two patterns?

To exist on an equal descriptive footing, they must each be abstracted from their physical embodiment by, for example, expressing their generating mechanisms via minimal probabilistic encodings. Measures of unpredictability, memory, and structure then naturally arise as information-theoretic properties of these encodings. Indeed, the fundamental interpretation of (Shannon) information is as a rate of encoding such sequences. This recasts the informational properties as answers to distinct communication problems. For instance, a process’ structure becomes the problem of two observers, Alice and Bob, synchronizing their predictions of the process.

However, what if the communication between Alice and Bob is not classical? What if Alice instead sends qubits to Bob; that is, what if they synchronize over a quantum channel? Does this change the communication requirements? More generally, does quantum communication enhance our understanding of what “pattern” is in the first place? What if the original process is itself quantum? More practically, is the quantum encoding more compact?

A provocative answer to the last question appeared recently [1-3], suggesting that a quantum representation can compress a stochastic process beyond its known classical limits [4]. In the following, we introduce a new construction for quantum channels that improves and broadens that result to any memoryful stochastic process, is highly computationally efficient, and points toward optimal quantum compression. Importantly, we draw out the connection between quantum compressibility and process cryptic order, a purely classical property that was only recently discovered [5]. Finally, we discuss the subtle way in which the quantum framing of pattern and structure differs from the classical.

Synchronizing  Classical  Processes 

To frame these questions precisely, we focus on patterns generated by discrete-valued, discrete-time stationary stochastic processes. There is a broad literature that addresses such emergent patterns [6-8]. In particular, computational mechanics is a well-developed theory of pattern whose primary construct, the ε-machine, is a process’ minimal, unifilar predictor [4]. The ε-machine’s causal states $\sigma \in \mathcal{S}$ are defined by the equivalence relation that groups all histories $\overleftarrow{x} = x_{-\infty:0}$ that lead to the same prediction of the future $\overrightarrow{X} = X_{0:\infty}$:

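$$\overleftarrow{x} \sim \overleftarrow{x}' \iff \Pr(\overrightarrow{X} \mid \overleftarrow{x}) = \Pr(\overrightarrow{X} \mid \overleftarrow{x}'). \tag{1}$$

A process’ ε-machine allows one to directly calculate its measures of unpredictability, memory, and structure.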

For example, the most basic question about unpredictability is, how much uncertainty about the next future observation remains given complete knowledge of the infinite past? This is measured by the well-known Shannon entropy rate $h_\mu$:

$$h_\mu = \lim_{L \to \infty} H(X_L \mid X_{0:L}), \tag{2}$$

where $X_L$ denotes the symbol random variable (r.v.) at time $L$, $X_{0:L}$ denotes the length-$L$ block of symbol r.v.s $X_0, \ldots, X_{L-1}$, and $H = -\sum_i p_i \log_2 p_i$ is the Shannon entropy (in bits) of the probability distribution $\{p_i\}$ [13]. A process’ ε-machine allows us to directly calculate this in closed form as the state-averaged branching uncertainty:

$$h_\mu = \sum_i \pi_i \, H(X_0 \mid \mathcal{S}_0 = \sigma_i), \tag{3}$$

where $\pi_i$ denotes the stationary distribution over the causal states. This form is possible due to the ε-machine’s unifilarity: in each state $\sigma$, each symbol $x$ leads to at most one successor state $\sigma'$.
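The state-averaged form in Eq. (3) is simple to evaluate numerically. The following is a minimal sketch, assuming an ε-machine stored as a dictionary of labeled transitions. The two-state Golden Mean machine used as input and the helper names `stationary_distribution` and `entropy_rate` are illustrative choices, not constructions taken from the text, and the power iteration assumes an aperiodic, strongly connected machine.

```python
import math

# Illustrative unifilar machine (an assumption, not a machine from the text):
# machine[state][symbol] = (probability, next_state).
def golden_mean(p):
    return {
        "A": {"1": (p, "A"), "0": (1.0 - p, "B")},
        "B": {"1": (1.0, "A")},
    }

def stationary_distribution(machine, iterations=10_000):
    """Estimate pi by power-iterating the state-to-state transition matrix."""
    states = sorted(machine)
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(iterations):
        nxt = {s: 0.0 for s in states}
        for s in states:
            for prob, target in machine[s].values():
                nxt[target] += pi[s] * prob
        pi = nxt
    return pi

def entropy_rate(machine):
    """h_mu = sum_i pi_i * H(X_0 | S_0 = sigma_i), in bits, as in Eq. (3)."""
    pi = stationary_distribution(machine)
    h = 0.0
    for s, edges in machine.items():
        # Branching uncertainty of the symbols emitted from state s.
        branching = -sum(q * math.log2(q) for q, _ in edges.values() if q > 0)
        h += pi[s] * branching
    return h

p = 0.666
machine = golden_mean(p)
pi = stationary_distribution(machine)
h_mu = entropy_rate(machine)
```

For this illustrative machine only state A branches, so the result reduces to $\pi_A$ times the binary entropy of $p$, which gives a quick sanity check on the routine.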

One can ask the complementary question: given knowledge of the infinite past, how much can we reduce our uncertainty about the future? This quantity is the mutual information between the past and future and is known as the excess entropy [9]:

$$\mathbf{E} = I[X_{-\infty:0} : X_{0:\infty}]. \tag{4}$$

It is the total amount of future information predictable from the past. Using the ε-machine we can directly calculate it also:

$$\mathbf{E} = I[\mathcal{S}^+ : \mathcal{S}^-], \tag{5}$$

where $\mathcal{S}^+$ and $\mathcal{S}^-$ are the forward (predictive) and reverse (retrodictive) causal states, respectively [5]. This suggests we think of any process as a channel that communicates the past to the future through the present. In this view $\mathbf{E}$ is the information transmission rate through the present “channel”. The excess entropy has been applied to capture the total predictable information in such diverse systems as Ising spin models [14], diffusion in nonlinear potentials [15], neural spike trains [16-18], and human language [19].

What memory is necessary to implement predicting $\mathbf{E}$ bits of the future given the past? Said differently, what resources are required to instantiate this putative channel? Most basically, this is simply the historical information the process remembers and stores in the present. The minimum necessary such information is that stored in the causal states, the statistical complexity [4]:

$$C_\mu = H(\mathcal{S}) = -\sum_i \pi_i \log_2 \pi_i. \tag{6}$$

Importantly, it is lower-bounded by the excess entropy:

$$\mathbf{E} \le C_\mu. \tag{7}$$

What do these quantities tell us? Perhaps the most surprising observation is that there is a large class of cryptic processes for which $\mathbf{E} \ll C_\mu$ [5]. The structural mechanism behind this difference is characterized by the cryptic order: the minimum $k$ for which $H[\mathcal{S}_k \mid X_{0:\infty}] = 0$. A related and more familiar property is the Markov order: the smallest $R$ for which $H[\mathcal{S}_R \mid X_{0:R}] = 0$. Markov order reflects a process’ historical dependence. These orders are independent apart from the fact that $k \le R$ [20,21]. It is worth pointing out that the equality $\mathbf{E} = C_\mu$ is obtained exactly for cryptic order $k = 0$ and, furthermore, that this corresponds with counifilarity: for each state $\sigma'$ and each symbol $x$, there is at most one prior state $\sigma$ that leads to $\sigma'$ on a transition generating $x$ [21].

These properties play a key role in the following communication scenario where we have a given process’ ε-machine in hand. Alice and Bob each have a copy. Since she has been following the process for some time, using her ε-machine Alice knows that the process is currently in state $\sigma_j$, say. From this knowledge, she can use her ε-machine to make the optimal probabilistic prediction $\Pr(X_{0:L} \mid \sigma_j)$ about the process’ future (and do so over arbitrarily long horizons $L$). While Bob is able to produce all such predictions from each of his ε-machine’s states, he does not know which particular state is currently relevant to Alice. We say that Bob and Alice are unsynchronized.

To communicate the relevant state to Bob, Alice must send at least $C_\mu$ bits of information. More precisely, to communicate this information for an ensemble (size $N \to \infty$) of ε-machines, she may, by the Shannon noiseless coding theorem [13], send $N C_\mu$ bits. Under this interpretation, $C_\mu$ is a fundamental measure of a process’ structure in that it characterizes not only the correlation between past and future, but also the mechanism of prediction. In the scenario with Alice and Bob, $C_\mu$ is seen as the communication cost to synchronize. We can also imagine Alice using this channel to communicate with her future self. In this light, $C_\mu$ is understood as a fundamental measure of a process’ internal memory.

Results 

Quantum Synchronization. What if Alice can send qubits to Bob? Consider a communication protocol in which Alice encodes the causal state in a quantum state that is sent to Bob. Bob then extracts the information through measurement of this quantum state. Their communication is implemented via a quantum object, the q-machine, that simulates the original stochastic process. It sports a single parameter that sets the horizon-length $L$ of future words incorporated in the quantum-state superpositions it employs. We monitor the q-machine protocol’s efficacy by comparing the quantum-state information transmission rate to the classical causal-state rate ($C_\mu$).

The q-machine $M(L)$ consists of a set $\{|\eta_k(L)\rangle\}$ of pure signal states that are in one-to-one correspondence with the classical causal states $\sigma_k \in \mathcal{S}$. Each signal state $|\eta_k(L)\rangle$ encodes the set of length-$L$ words that may follow $\sigma_k$, as well as each corresponding conditional probability used for prediction from $\sigma_k$. Fixing $L$, we construct quantum states of the form:

$$|\eta_j(L)\rangle = \sum_{w^L \in \mathcal{A}^L} \sum_{\sigma_k \in \mathcal{S}} \sqrt{\Pr(w^L, \sigma_k \mid \sigma_j)} \; |w^L\rangle |\sigma_k\rangle, \tag{8}$$

where $w^L$ denotes a length-$L$ word and $\Pr(w^L, \sigma_k \mid \sigma_j) = \Pr(X_{0:L} = w^L, \mathcal{S}_L = \sigma_k \mid \mathcal{S}_0 = \sigma_j)$. Due to ε-machine unifilarity, a word $w^L$ following a causal state $\sigma_j$ leads to only one subsequent causal state. Thus, $\Pr(w^L, \sigma_k \mid \sigma_j) = \Pr(w^L \mid \sigma_j)$. The resulting Hilbert space is the product $\mathcal{H}_w \otimes \mathcal{H}_\sigma$. Factor space $\mathcal{H}_\sigma$ is of size $|\mathcal{S}|$, the number of classical causal states, with basis elements $|\sigma_k\rangle$. Factor space $\mathcal{H}_w$ is of size $|\mathcal{A}|^L$, the number of length-$L$ words, with basis elements $|w^L\rangle = |x_0\rangle \cdots |x_{L-1}\rangle$.

Note that the $L = 1$ q-machine $M(1)$ is equivalent to the construction introduced in ref. 1. Additionally, insight about the q-machine can be gained through its connection with the classical concatenation machine defined in ref. 22; the q-machine $M(L)$ is equivalent to the q-machine $M(1)$ derived from the $L$th concatenation machine.

Having specified the state space, we now describe how the q-machine produces symbol sequences. Given one of the pure quantum signal states, we perform a projective measurement in the $\mathcal{H}_w$ basis. This results in a symbol string $w^L = x_0, \ldots, x_{L-1}$, which we take as the next $L$ symbols in the generated process. Since the ε-machine is unifilar, the quantum conditional state must be in some basis state $|\sigma_k\rangle$ of $\mathcal{H}_\sigma$. Subsequent measurement in this basis then indicates the corresponding classical causal state with no uncertainty.

Observe that the probability of a word $w^L$ given quantum state $|\eta_k\rangle$ is equal to the probability of that word given the analogous classical state $\sigma_k$. Also, the classical knowledge of the subsequent corresponding causal state can be used to prepare a subsequent quantum state for continued symbol generation. Thus, the q-machine generates the desired stochastic process and is, in this sense, equivalent to the classical ε-machine.

Focus now on the q-machine’s initial quantum state:

$$\rho(L) = \sum_i p_i \, |\eta_i(L)\rangle \langle \eta_i(L)|. \tag{9}$$

We see this mixed quantum state is composed of pure signal states combined according to the probabilities of each being prepared by Alice (or being realized by the original process that she observes). These are simply the probabilities of each corresponding classical causal state, which we take to be the stationary distribution: $p_i = \pi_i$. In short, quantum state $\rho(L)$ is what Alice must transmit to Bob for him to successfully synchronize. Later, we revisit this scenario to discuss the tradeoffs associated with the q-machine representation.

If Alice sends a large number $N$ of these states, she may, according to the quantum noiseless coding theorem [23], compress this message into $N S(\rho(L))$ qubits, where $S$ is the von Neumann entropy $S(\rho) = -\mathrm{tr}(\rho \log \rho)$. Due to its parallel with $C_\mu$, and for convenience, we define the function:

$$C_q(L) = S(\rho(L)). \tag{10}$$

Recall that, classically, Alice must send $N C_\mu$ bits. To the extent that $N C_q(L)$ is smaller, the quantum protocol will be more efficient. In this particular sense, the q-machine is a compressed representation of the original process and its ε-machine.
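To evaluate $C_q(L)$ numerically one does not need to build $\rho(L)$ in the full Hilbert space: the nonzero eigenvalues of $\rho(L) = \sum_i \pi_i |\eta_i\rangle\langle\eta_i|$ coincide with those of the weighted Gram matrix with entries $\sqrt{\pi_i \pi_j}\,\langle\eta_i|\eta_j\rangle$. The sketch below illustrates this under the assumption of real overlaps (which holds for the nonnegative amplitudes of Eq. (8)). The function names are hypothetical, and the plain Jacobi iteration is used only to keep the example free of external dependencies; a library eigensolver for symmetric matrices would normally take its place.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=100, tol=1e-24):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def quantum_state_entropy(pi, overlaps):
    """S(rho) in bits from state probabilities and signal-state overlaps."""
    n = len(pi)
    gram = [[math.sqrt(pi[i] * pi[j]) * overlaps[i][j] for j in range(n)]
            for i in range(n)]
    eigs = symmetric_eigenvalues(gram)
    # Eigenvalues at or below round-off contribute nothing to the entropy.
    return -sum(x * math.log2(x) for x in eigs if x > 1e-15)

# Two equiprobable signal states with overlap c: eigenvalues are (1 +/- c) / 2.
c = 0.5
S = quantum_state_entropy([0.5, 0.5], [[1.0, c], [c, 1.0]])
```

Orthogonal signal states (all off-diagonal overlaps zero) recover the Shannon entropy of the state distribution, while any nonzero overlap lowers the result, which is the sense in which $C_q(L)$ can fall below $C_\mu$.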

Example Processes: $C_q(L)$. Let’s now draw out specific consequences of using the q-machine. We explore protocol efficiency by calculating $C_q(L)$ for several example processes, each chosen to illustrate distinct properties: the q-machine affords a quantum advantage, further compression can be found at longer horizons $L$, and the compression rate is minimized at the horizon length $k$, the cryptic order of the classical process [21].

For each example, we examine a process family by sweeping one transition probability parameter, illustrating $C_q(L)$ and its relation to classical bounds $C_\mu$ and $\mathbf{E}$. Additionally, we highlight a single representative process at one generic transition probability. Following these examples, we turn to discuss more general properties of q-machine compression that apply quite broadly and how the results alter our notion of quantum structural complexity.

Biased Coins Process. The Biased Coins Process provides a first, simple case that realizes a nontrivial quantum state entropy [1]. There are two biased coins, named A and B. The first generates 0 with probability $p$; the second, 0 with probability $1 - p$. At each step, one coin is flipped; which coin is flipped depends on the result of the previous flip. If the previous flip yielded a 0, the next flip is made using coin A. If the previous flip yielded a 1, the next flip is made using coin B. Equivalently, coin A is flipped again after it yields 0, coin B is flipped again after it yields 1, and otherwise the coins are switched. Its two causal-state ε-machine is shown in Fig. 1(top).

Figure 1. Biased Coins Process: (top) ε-machine. Edges are conditional probabilities. For example, self-loop label $p|0$ from state A indicates $\Pr(0|A) = p$. (left) Statistical complexity $C_\mu$, quantum state entropy $C_q(L)$, and excess entropy $\mathbf{E}$ as a function of A’s self-loop probability $p \in [0, 1]$. $C_q(1)$ (dark blue) lies between $C_\mu$ and $\mathbf{E}$ (bits), except for extreme parameters and the center ($p = 1/2$). (right) For $p = 0.666$, $C_q(L)$ decreases from $L = 0$ to $L = 1$ and is then constant; the process is maximally compressed at $L = 1$, its cryptic order $k = 1$. This yields substantial compression: $C_q(1) \ll C_\mu$.

Consider $p \approx 1/2$. The generated sequence is close to that of a fair coin. And, starting with coin A or B makes little difference to the future; there is little to predict about future sequences. This intuition is quantified by the predictable information $\mathbf{E} \approx 0$, when $p$ is near 1/2. See Fig. 1(left).

In contrast, since the causal states have equal probability, $C_\mu = 1$ bit independent of parameter $p$. (All information measures are quoted in log base 2.) This is because there is always some, albeit very little, predictive advantage to remembering whether the last symbol was 0 or 1. Retaining this advantage, however small, requires the use of an entire (classical) bit. The gap between $C_\mu$ and $\mathbf{E}$ presents an opportunity for large quantum improvement. It is only at the exact value $p = 1/2$ where the two causal states merge, this advantage disappears, and the process becomes memoryless or independent, identically distributed (IID). This is reflected in the discontinuity of $C_\mu$ as $p \to 1/2$, which is sometimes misinterpreted as a deficiency of $C_\mu$. Contrariwise, this feature follows naturally from the equivalence relation Eq. (1) and is a signature of symmetry.

Now, let’s consider these complexities in the quantum setting where we monitor communication costs using $C_q(L)$. To understand its behavior, we first write down the q-machine’s states. For $L = 0$, we have the trivial $|\eta_A(0)\rangle = |A\rangle$ and $|\eta_B(0)\rangle = |B\rangle$. For $L = 1$, Eq. (8) with $\Pr(0|A) = p$ gives $|\eta_A(1)\rangle = \sqrt{p}\,|0\rangle|A\rangle + \sqrt{1-p}\,|1\rangle|B\rangle$ and $|\eta_B(1)\rangle = \sqrt{1-p}\,|0\rangle|A\rangle + \sqrt{p}\,|1\rangle|B\rangle$. The von Neumann entropy of the former is simply the Shannon information of the signal state distribution; that is, $C_q(0) = C_\mu$. In the latter, however, the two quantum states have a nonzero overlap (inner product). This implies that the von Neumann entropy is smaller than the Shannon entropy: $C_q(1) < C_q(0)$. (See ref. 24, Thm. 11.10.) Also, making use of the Holevo bound, we see that $\mathbf{E} \le C_q(1)$ [1,25]. These bounds are maintained for all $L$: $\mathbf{E} \le C_q(L) \le C_\mu$. This follows by considering the q-machine $M(1)$ of the $L$th classical concatenation, described above.

(Note that for $p \in \{0, 1/2, 1\}$ these quantities are all equal and equal to zero. This comes from the simplification of process topology caused by state merging dictated by the predictive equivalence relation, Eq. (1).)
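The ordering of the three costs for the Biased Coins Process can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions rather than the computation used for Fig. 1: it takes the two $L = 1$ signal states to overlap by $2\sqrt{p(1-p)}$, uses the fact that two equiprobable pure states with overlap $c$ give a density matrix with eigenvalues $(1 \pm c)/2$, and evaluates the excess entropy as $H[X_0] - h_\mu$, which assumes the process is order-1 Markov with $\Pr(0) = 1/2$. The function names are hypothetical, and the formulas are meant for generic $0 < p < 1$ with $p \neq 1/2$.

```python
import math

def binary_entropy(x):
    """Shannon entropy (bits) of the two-outcome distribution (x, 1 - x)."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def biased_coins_costs(p):
    """Return (E, C_q(1), C_mu) in bits for generic p (not 0, 1/2, or 1)."""
    c_mu = 1.0  # two equiprobable causal states
    # Both L = 1 signal states have support on |0>|A> and |1>|B>.
    overlap = 2.0 * math.sqrt(p * (1.0 - p))
    # With pi = (1/2, 1/2), rho(1) has eigenvalues (1 +/- overlap) / 2.
    c_q1 = binary_entropy((1.0 + overlap) / 2.0)
    # Assumption: order-1 Markov with Pr(0) = 1/2, so E = H[X_0] - h_mu.
    excess_entropy = 1.0 - binary_entropy(p)
    return excess_entropy, c_q1, c_mu

E, Cq1, Cmu = biased_coins_costs(0.666)
```

At $p = 0.666$ the three values come out strictly ordered, $\mathbf{E} < C_q(1) < C_\mu$, with $C_q(1)$ far closer to $\mathbf{E}$ than to the classical cost of one bit.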

How do costs change with sequence length $L$? To see this, Fig. 1(right) expands the left view for a single value of $p$. As expected, $C_q(L)$ decreases from $L = 0$ to $L = 1$. However, it then remains constant for all $L \ge 1$. There is no additional quantum state-compression afforded by expanding the q-machine to use longer horizons.

The Biased Coins Process has been analyzed earlier using a construction equivalent to an $L = 1$ q-machine [1], similarly finding that the number of required qubits falls between $\mathbf{E}$ and $C_\mu$. The explanation there for this compression ($C_q(1) < C_\mu$) was lack of counifilarity in the process’ ε-machine. More specifically, ref. 1 showed that $\mathbf{E} = C_q = C_\mu$ if and only if the ε-machine is counifilar, and $\mathbf{E} < C_q < C_\mu$ otherwise. The Biased Coins Process is easily seen to be noncounifilar and so the inequality follows.

Figure 2. 4-3 Golden Mean Process: (top) The ε-machine. (left) Statistical complexity $C_\mu$, quantum state entropy $C_q(L)$, and excess entropy $\mathbf{E}$ as a function of A’s self-loop probability $p \in [0, 1]$. $C_q(L)$ is calculated and plotted (light to dark blue) up to $L = 5$. (right) For $p = 0.666$, $C_q(L)$ decreases monotonically until $L = 3$, the process’ cryptic order. The additional compression is substantial: $C_q(3) \ll C_q(1)$.


This previous analysis happens to be sufficient for the Biased Coins Process, since $C_q(L)$ does not decrease beyond $L = 1$. Unfortunately, only this single, two-state process was analyzed previously when, in fact, the space of processes is replete with richly structured behaviors [26]. With this in mind, and to show the power of the q-machine, we step into deeper water and consider a 7-state process that is almost periodic with a random phase-slip.

R-k Golden Mean Process. The $R$-$k$ Golden Mean Process is a useful generalization of the Markov order-1 Golden Mean Process that allows for the independent specification of Markov order $R$ and cryptic order $k$ [20,21]. Figure 2(top) illustrates its ε-machine. We take $R = 4$ and $k = 3$.

The calculations in Fig. 2(left) show again that $C_q(L)$ generically lies between $\mathbf{E}$ and $C_\mu$, across this family of processes. In contrast with the previous example, $C_q(L)$ continues to decrease beyond $L = 1$. Figure 2(right) illustrates that the successive q-machines continue to reduce the von Neumann entropy: $C_\mu > C_q(1) > C_q(2) > C_q(3)$. However, there is no further improvement beyond a future-depth of $L = 3$, the cryptic order: $C_q(3) = C_q(L > 3)$. It is important to note that the compression improvements at stages $L = 2$ and $L = 3$ are significant. Therefore, a length-1 quantum representation misses the majority of the quantum advantage.

To understand these results we need to sort out how quantum compression stems from noncounifilarity. In short, the latter leads to quantum signal states with nonzero overlap that allow for super-classical compression. Let’s explain using the current example. There is one noncounifilar state in this process, state A. Both states A and G lead to A on a symbol 1. Due to this, at $L = 1$, the two q-machine states:

$$|\eta_A(1)\rangle = \sqrt{p}\,|1\rangle|A\rangle + \sqrt{1-p}\,|0\rangle|B\rangle \quad \text{and} \tag{11}$$

$$|\eta_G(1)\rangle = |1\rangle|A\rangle \tag{12}$$

have a nonzero overlap of $\langle \eta_A | \eta_G \rangle = \sqrt{p}$. (All other overlaps in the $L = 1$ q-machine vanish.) As with the Biased Coins Process, this leads to the inequality $C_q(1) < C_\mu$.

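Extending the representation to $L = 2$ words, we find three nonorthogonal quantum states:

$$|\eta_A(2)\rangle = \sqrt{p^2}\,|11\rangle|A\rangle + \sqrt{p(1-p)}\,|10\rangle|B\rangle + \sqrt{1-p}\,|00\rangle|C\rangle, \tag{13}$$

$$|\eta_F(2)\rangle = |11\rangle|A\rangle, \quad \text{and} \tag{14}$$

$$|\eta_G(2)\rangle = \sqrt{p}\,|11\rangle|A\rangle + \sqrt{1-p}\,|10\rangle|B\rangle, \tag{15}$$

with three nonzero overlaps $\langle \eta_A | \eta_F \rangle = p$, $\langle \eta_A | \eta_G \rangle = \sqrt{p}$, and $\langle \eta_F | \eta_G \rangle = \sqrt{p}$.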

Note that the overlap $\langle \eta_A | \eta_G \rangle$ is unchanged. This is because the conditional futures are identical once the merger on symbol 1 has taken place. That is, the words 11 and 10, which contribute to the $L = 2$ overlap $\langle \eta_A | \eta_G \rangle$, simply derive from the prefix 1, which was the source of the overlap at $L = 1$. In order to obtain a change in this or any other overlap, there must be a new merger-inducing prefix (for that state-pair). (See Sec. 5 for computational implications.) Since all quantum amplitudes are positive, each pairwise overlap is a nondecreasing function of $L$.
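As a short worked check of this invariance, the $L = 2$ overlap can be expanded term by term from Eqs. (13) and (15); only the components on $|11\rangle|A\rangle$ and $|10\rangle|B\rangle$ are shared, and their contributions sum back to the $L = 1$ value:

```latex
\langle \eta_A(2) | \eta_G(2) \rangle
  = \sqrt{p^2}\,\sqrt{p} + \sqrt{p(1-p)}\,\sqrt{1-p}
  = p\sqrt{p} + (1-p)\sqrt{p}
  = \sqrt{p}
  = \langle \eta_A(1) | \eta_G(1) \rangle .
```

The suffix probabilities following the shared prefix 1 sum to one, which is why appending symbols to an existing merger leaves the overlap untouched.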

At $L = 2$ we have two such new mergers: 11 for $\langle \eta_A | \eta_F \rangle$ and 11 for $\langle \eta_F | \eta_G \rangle$. This additional increase in pairwise overlaps leads to a second decrease in the von Neumann entropy. (See Sec. 3 for details.) Then, at $L = 3$, we find three new mergers: 111 for $\langle \eta_A | \eta_E \rangle$, 111 for $\langle \eta_E | \eta_F \rangle$, and 111 for $\langle \eta_E | \eta_G \rangle$. As before, the pre-existing mergers simply acquire suffixes and do not change the degree of overlap.

Importantly, we find that at $L = 4$ there are no new mergers. That is, any length-4 word that leads to the merging of two states must merge before the fourth symbol. In general, the length at which the last merger occurs is equivalent to the cryptic order [21]. Further, it is known that the von Neumann entropy is a function of pairwise overlaps of signal states [27]. Therefore, a lack of new mergers, and thus constant overlaps, implies that the von Neumann entropy is constant. This demonstrates that $C_q(L)$ is constant for $L \ge k$, for $k$ the cryptic order.
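The saturation of overlaps at the cryptic order can be seen directly by building the signal states of Eq. (8) for increasing $L$. The following is a minimal sketch, not the efficient algorithm referred to in the text. The transition table is an assumption inferred from Eqs. (11)-(15) and the surrounding discussion (A loops on 1 and leaves on 0, states B through D emit 0, states E through G emit 1 and return to A), and the function names are hypothetical.

```python
import math

def rk_golden_mean_4_3(p):
    """Assumed 4-3 Golden Mean machine: machine[state][symbol] = (prob, next)."""
    return {
        "A": {"1": (p, "A"), "0": (1.0 - p, "B")},
        "B": {"0": (1.0, "C")},
        "C": {"0": (1.0, "D")},
        "D": {"0": (1.0, "E")},
        "E": {"1": (1.0, "F")},
        "F": {"1": (1.0, "G")},
        "G": {"1": (1.0, "A")},
    }

def signal_state(machine, start, L):
    """Amplitudes of |eta_start(L)>, keyed by (word, final causal state)."""
    paths = {("", start): 1.0}  # probability of each (word, state) pair
    for _ in range(L):
        extended = {}
        for (word, state), prob in paths.items():
            for symbol, (q, target) in machine[state].items():
                # Unifilarity: each word from `start` follows a single path.
                extended[(word + symbol, target)] = prob * q
        paths = extended
    return {key: math.sqrt(prob) for key, prob in paths.items()}

def overlaps(machine, L):
    """Pairwise overlaps <eta_i(L)|eta_j(L)>, keyed by state pair (i, j)."""
    etas = {s: signal_state(machine, s, L) for s in machine}
    return {
        (i, j): sum(amp * etas[j].get(key, 0.0) for key, amp in etas[i].items())
        for i in machine for j in machine
    }

p = 0.666
machine = rk_golden_mean_4_3(p)
gram = {L: overlaps(machine, L) for L in range(0, 6)}

# Largest change in any pairwise overlap when the horizon grows from L to L + 1.
max_change = {
    L: max(abs(gram[L + 1][pair] - gram[L][pair]) for pair in gram[L])
    for L in range(0, 5)
}
```

Under the assumed transition table, `max_change` is nonzero for the steps ending at $L = 1, 2, 3$ and vanishes (to round-off) afterwards, matching the statement that no new mergers appear beyond the cryptic order $k = 3$.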

The $R$-$k$ Golden Mean Process was selected to highlight the unique role of the cryptic order, by drawing a distinction between it and Markov order. The result emphasizes the physical significance of the cryptic order. In the example, it is not until $L = 4$ that a naive observer can synchronize to the causal state; this is shown by the Markov order. For example, the word 000 induces two states D and E. Just one more symbol synchronizes to either E (on 0) or F (on 1). Yet recall that synchronization can come about in two ways. A word may either induce a path merger or a path termination. All merger-type synchronizations must occur no later than the last termination-type synchronization. This is equivalently stated: the cryptic order is never greater than the Markov order [21].

In  the  current  example,  we  observe  this  termination-type  of  synchronization  on  the  symbol  following  000. 
For  instance,  0000  does  not  lead  to  the  merger  of  paths  originating  in  multiple  states.  Rather,  it  eliminates  the 
possibility  that  the  original  state  might  have  been  B. 

It is the final merger-type synchronization at $L = 3$ that leads to the final unique-prefix quantum merger and, thus, to the ultimate minimization of the von Neumann entropy. So, we see that in the context of the q-machine, the most efficient state compression is accomplished at the process’ cryptic order. (One could certainly continue beyond the cryptic order, but at best this increases implementation cost with no functional benefit.)

Nemo Process. To demonstrate the challenges in quantum compressing typical memoryful stochastic processes, we conclude our set of examples with the seemingly simple three-state Nemo Process, shown in Fig. 3(top). Despite its overt simplicity, both Markov and cryptic orders are infinite. As one should now anticipate, each increase in the length $L$ affords a smaller and smaller state entropy, yielding the infinite chain of inequalities: $C_\mu > C_q(1) > C_q(2) > C_q(3) > \cdots > C_q(\infty)$. Figure 3(right) verifies this. This sequence approaches the asymptotic value $C_q(\infty) \approx 1.0332$. We also notice that the convergence of $C_q(L)$ is richer than in the previous processes. For example, while the sequence monotonically decreases (and at each $p$), it is not convex in $L$. For instance, the fourth quantum incremental improvement is greater than the third.

We now turn to discuss the broader theory that underlies the preceding analyses. We first address the convergence properties of $C_q(L)$, then the importance of studying the full range of memoryful stochastic processes, and finally tradeoffs between synchronization, compression, and prediction.

$C_q(L)$ Monotonicity. It is important to point out that while we observed nonincreasing $C_q(L)$ in our examples, this does not constitute proof. The latter is nontrivial since ref. 27 showed that each pairwise overlap of signal states can increase while also increasing von Neumann entropy. (This assumes a constant distribution over signal states.) Furthermore, this phenomenon occurs with nonzero measure. They also provided a criterion that can exclude this somewhat nonintuitive behavior. Specifically, if the element-wise ratio matrix $R$ of two Gram matrices of signal states is a positive operator, then strictly increasing overlaps imply a decreasing von Neumann entropy. We note, however, that there exist processes with ε-machines for which the $R$ matrix is nonpositive. At the same time, we have found no example of an increasing $C_q(L)$.

So, while it appears that a new criterion is required to settle this issue, the preponderance of numerical evidence suggests that $C_q(L)$ is indeed monotonically decreasing. In particular, we verified $C_q(L)$ monotonicity for many processes drawn from the topological ε-machine library [28]. Examining 1000 random samples of two-symbol, $N$-state processes for $2 \le N \le 7$ yielded no counterexamples. Thus, failing a proof, the survey suggests that this is the dominant behavior.

Infinite Cryptic Order Dominates. The Biased Coins Process, being cryptic order $k = 1$, is atypical. Previous exhaustive surveys demonstrated the ubiquity of infinite Markov and cryptic orders within process space. For example, Fig. 4 shows the distribution of different Markov and cryptic orders for processes generated by six-state, binary-alphabet, exactly-synchronizing ε-machines [29]. The overwhelming majority have infinite Markov and cryptic orders. Furthermore, among those with finite cryptic order, orders zero and one are not common. Such surveys in combination with the apparent monotonic decrease of $C_q(L)$ confirm that, when it comes to general claims about compressibility and complexity, it is advantageous to extend analyses to long sequence lengths.

Figure 3. Nemo Process: (top) Its ε-machine. (left) Statistical complexity $C_\mu$, quantum state entropy $C_q(L)$, and excess entropy $\mathbf{E}$ as a function of A’s self-loop probability $p \in [0, 1]$. $C_q(L)$ is calculated and plotted (light to dark blue) for $L = 0, 1, \ldots, 19$. (right) For $p = 0.666$, $C_q(L)$ decreases monotonically, never reaching the limit since the process’ cryptic order is infinite. The full quantum advantage is realized only in the limit.


Figure 4. Distribution of Markov order $R$ and cryptic order $k_\chi$ for all 1,132,613 six-state, binary-alphabet, exactly-synchronizing ε-machines. Marker size is proportional to the number of ε-machines within this class at the same ($R$, $k_\chi$). (Reprinted with permission from ref. 29.)


Prediction-Compression Trade Off. Let’s return to Alice and Bob in their attempt to synchronize on a
given stochastic process to explore somewhat subtle trade-offs in compressibility, prediction, and complexity.
Figure 5 illustrates the difference in their ability to generate probabilistic predictions about the future given the
historical data. There, Alice is in causal state A (signified by A for Alice). Her prediction “cone”, depicted in light
gray, is the span over which she can generate probabilistic predictions conditioned on the current causal
state (A). She chooses to map this classical causal state to an L = 3 q-machine state and send it to Bob. (Whether
this is part of an ensemble of other such states or not affects the rate of qubit transmission, but not the following
argument.) It is important to understand that Bob cannot actually determine the corresponding causal state (at
time t = 0). He can, however, make a measurement that results in some symbol sequence of length 3 followed by
a definite (classical) causal state. In the figure, he generates the sequence 111 followed by causal state A at time
t = 3. This is shown by the blue state-path ending in B for Bob. Now Bob is in position to generate corresponding
conditional predictions: B’s future cone Pr(X_{0:∞} | B) (dark gray). As the figure shows, this cone is only a
subprediction of Alice’s. That is, it is equivalent to Alice’s prediction conditioned on her observation of 111 or any
other word leading to the same state.

Figure 5. Trading prediction for quantum compression: A is Alice’s state of predictive knowledge. B is that
for Bob, except when he uses the process’ ε-machine to refine it, in which case his predictive knowledge
becomes that in B′, which can occur at a time no earlier than that determined by the cryptic order k.

Now, what can Bob say about times t = 0, 1, 2? The light blue states and dashed edges in the figure show the
alternate paths that could also have led to his measurement of the sequence 111 and state A. For instance, Bob
can only say that Alice might have been in causal states A, D, or E at time t = 0. In short, the quantum
representation led to Bob’s uncertainty about the initial state sequence and, in particular, Alice’s prediction.
Altogether, we see that the quantum representation gains compressibility at the expense of Bob’s predictive power.

What if Alice does not bother to compute k and, wanting to make good use of quantum compressibility, uses
an L = 1000 q-machine? Does this necessarily translate into Bob’s uncertainty in the first 1000 states and,
therefore, only a highly conditional prediction? In our example, Alice was not quite so enthusiastic and settled for
the L = 3 q-machine. We see that Bob can use his current state A at t = 3 and knowledge of the word that led to it to
infer that the state at t = 2 must have been A. The figure denotes his knowledge of this state by B′. For other words
he may be able to trace farther back. (For instance, 000 can be traced back from D at t = 3 all the way to A at t = 0.)
The situation chosen in the figure illustrates the worst-case scenario for this process, where he is able to trace back
and discover all but the first 2 states. The worst-case scenario defines the cryptic order k, in this case k = 2. After
this tracing back, Bob is then able to make the improved statement, “If Alice observes symbols 11, then her
conditional prediction will be Pr(X_{0:∞} | A)”. This means that Alice and Bob cannot suffer through overcoding,
that is, using an L in excess of k.

Finally, one feature that is unaffected by such manipulations is the ability of Alice and Bob to generate a single
future instance drawn from the distribution Pr(X_{0:∞} | A). This helps to emphasize that generation is distinct from
prediction. Note that this is true for the q-machine M(L) at any length.

Methods 

Let’s explain the computation of Cq(L). First, note that the size of the q-machine M(L) Hilbert space grows as |A|^L
(|A|^{2L} for the density operators). That is, computing Cq(L = 20) for the Nemo Process involves finding eigenvalues
of a matrix with 10^12 elements. Granted, these matrices are often sparse, but the number of components in each
signal state still grows exponentially with L, at a rate set by the topological entropy rate of the process. This alone
would drive computations for even moderately complex processes (described by moderate-sized ε-machines) beyond
the reach of contemporary computers.
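
As a quick check of the scale quoted above, the following worked numbers assume a binary alphabet, |A| = 2, for the Nemo Process at L = 20, and count only the word register:

```latex
|\mathcal{A}|^{L} = 2^{20} \approx 1.05 \times 10^{6},
\qquad
|\mathcal{A}|^{2L} = 2^{40} \approx 1.10 \times 10^{12}.
```

The second figure is the number of elements in the corresponding density matrix, in line with the 10^12 stated in the text.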

Recall though that there are, at any L, still only |S| quantum signal states to consider. Therefore, the embedding
of this constant-sized subspace wastes an exponential amount of the embedding space. We desire a computation
of Cq(L) that is independent of the diverging embedding dimension.

Another source of difficulty is the exponentially increasing number of words with L. However, we only need
to consider a small subset of these words. Once a merger has occurred between states |η_i⟩ and |η_j⟩ on word w,
subsequent symbols, while maintaining that merger, do not add to the corresponding overlap. That is, the
contribution to the overlap ⟨η_i|η_j⟩ by all words with prefix w is complete.
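
The reason later symbols add nothing can be written in one line. This sketch assumes the overlap is a sum, over words that drive both causal states to a common state, of the square root of the product of the word probabilities (the signal-state form used earlier in the paper). Here σ_m is a label introduced for the common state reached on w:

```latex
\sum_{x} \sqrt{\Pr(wx \mid \sigma_i)\,\Pr(wx \mid \sigma_j)}
  = \sqrt{\Pr(w \mid \sigma_i)\,\Pr(w \mid \sigma_j)} \sum_{x} \Pr(x \mid \sigma_m)
  = \sqrt{\Pr(w \mid \sigma_i)\,\Pr(w \mid \sigma_j)} .
```

Since both components follow the same transitions after merging, extending w by any number of symbols redistributes this weight without changing its total.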

To  take  advantage  of  these  two  opportunities  for  reduction,  we  compute  Cq(L)  in  the  following  manner. 

First, we construct the “pairwise-merger machine” (PMM) from the ε-machine. The states of the PMM are
unordered pairs of causal states. A pair-state (σ_i, σ_j) leads to (σ_m, σ_n) on symbol x if σ_i leads to σ_m on x and
σ_j leads to σ_n on x. (Pairs are unordered, so (σ_m, σ_n) = (σ_n, σ_m).) If both components in a pair-state lead to
the same causal state, then this represents a merger. Of course, these mergers from pair-states occur only when
entering noncounifilar states of the ε-machine. If either component state forbids subsequent emission of symbol x,
then that edge is omitted. The PMMs for the three example processes are shown in Fig. 6.

Figure 6. Pairwise-merger machines for our three example processes. Pair-states (red) lead to each other or
enter the ε-machine at a noncounifilar state. For example, in the R-k Golden Mean (middle), the two pair-states
AF and FG both lead to pair-state AG on 0. Then pair-state AG leads to state A, the only noncounifilar state in
this ε-machine.
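
The PMM construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary encoding of the ε-machine, the name `build_pmm`, and the three-state machine `TOY` are all assumptions introduced here for the example.

```python
from itertools import combinations

# Hypothetical unifilar machine: state -> {symbol: (next_state, probability)}.
# It is a toy example, not one of the paper's three processes.
TOY = {
    "A": {"0": ("B", 0.5), "1": ("A", 0.5)},
    "B": {"0": ("C", 0.5), "1": ("A", 0.5)},
    "C": {"1": ("A", 1.0)},
}


def build_pmm(machine):
    """Return the pairwise-merger machine as {pair: {symbol: (kind, target)}}.

    kind is "pair" when the two components stay distinct and "merge" when
    both components lead to the same causal state on that symbol.
    """
    pmm = {}
    for s, t in combinations(sorted(machine), 2):  # unordered pairs
        edges = {}
        # A symbol forbidden by either component yields no edge.
        for x in sorted(machine[s].keys() & machine[t].keys()):
            s_next, _ = machine[s][x]
            t_next, _ = machine[t][x]
            if s_next == t_next:
                edges[x] = ("merge", s_next)
            else:
                edges[x] = ("pair", tuple(sorted((s_next, t_next))))
        pmm[(s, t)] = edges
    return pmm


if __name__ == "__main__":
    for pair, edges in build_pmm(TOY).items():
        print(pair, edges)
```

In this toy machine every merger enters state A, which is the only state with more than one incoming edge on the same symbol.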

Now, making use of the PMM, we begin at each noncounifilar state and proceed backward through the
pair-state transient structure. At each horizon-length, we record the pair-states visited and with what
probabilities. This allows computing each increment to each overlap. Importantly, by moving up the transient
structure, we avoid keeping track of any further novel overlaps; they are all “behind us”. Additionally, the finite
number of pair-states gives us a finite structure through which to move; when the end of a branch is reached, its
contributions cease. It is worth noting that this pair-state transient structure may contain cycles (as it does for the
Nemo Process). In that case, the algorithm is non-halting, but it is clear that contributions generated within a cycle
decrease exponentially.
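
The accumulation of overlap increments can be illustrated with a short Python sketch. Note that this is a forward recursion over pair-states, swapped in for simplicity; it is not the backward traversal described above. It assumes each merger on word w contributes sqrt(Pr(w|σ_i) Pr(w|σ_j)) to the overlap, and it truncates at a chosen `max_len`, which also sidesteps the non-halting case of cycles. The machine encoding, the name `cumulative_overlaps`, and the toy machine are hypothetical.

```python
from math import sqrt

# Hypothetical unifilar machine: state -> {symbol: (next_state, probability)}.
TOY = {
    "A": {"0": ("B", 0.5), "1": ("A", 0.5)},
    "B": {"0": ("C", 0.5), "1": ("A", 0.5)},
    "C": {"1": ("A", 1.0)},
}


def cumulative_overlaps(machine, i, j, max_len):
    """Return the overlap of signal states i != j for L = 0, 1, ..., max_len."""
    if i == j:
        raise ValueError("expects two distinct causal states")
    # weights[pair]: summed amplitude of words that have not yet merged
    # and currently leave the two components in that pair-state.
    weights = {tuple(sorted((i, j))): 1.0}
    total = 0.0
    overlaps = [total]  # at L = 0 distinct causal states are orthogonal
    for _ in range(max_len):
        next_weights = {}
        for (a, b), weight in weights.items():
            for x in machine[a].keys() & machine[b].keys():
                a_next, p_a = machine[a][x]
                b_next, p_b = machine[b][x]
                amplitude = weight * sqrt(p_a * p_b)
                if a_next == b_next:
                    total += amplitude  # merger: this branch is complete
                else:
                    key = tuple(sorted((a_next, b_next)))
                    next_weights[key] = next_weights.get(key, 0.0) + amplitude
        weights = next_weights
        overlaps.append(total)
    return overlaps


if __name__ == "__main__":
    print(cumulative_overlaps(TOY, "A", "B", 4))
```

For this acyclic toy machine the overlaps stop changing once every branch has either merged or died out; with cycles, the per-step increments would instead shrink geometrically.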

All of this serves to yield the set of overlaps at each length. We then use a Gram-Schmidt-like procedure to
produce a set of |S| vectors in R_+^{|S|} (the positive hyperoctant) having the desired set of overlaps.

Weighting these real, positive vectors with the stationary distribution yields a real, positive-element
representation of the density operator restricted to the subspace spanned by the signal states. At this point,
computing Cq(L) reduces to finding eigenvalues of an |S| × |S| matrix.
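
The final eigenvalue step can be sketched as follows. Instead of the Gram-Schmidt-like construction, this sketch uses a standard shortcut: the density operator built from weighted signal states has the same nonzero eigenvalues as the weighted Gram matrix with entries sqrt(π_i π_j)⟨η_i|η_j⟩. The function name, the use of NumPy, and the choice of base-2 logarithm (entropy in bits) are assumptions made here.

```python
import numpy as np


def quantum_state_entropy(pi, gram, tol=1e-12):
    """Von Neumann entropy (bits) of rho = sum_i pi_i |eta_i><eta_i|.

    pi   : stationary probabilities of the causal states, length n
    gram : n x n real symmetric matrix of overlaps <eta_i|eta_j>
    """
    pi = np.asarray(pi, dtype=float)
    gram = np.asarray(gram, dtype=float)
    root = np.sqrt(pi)
    # Weighted Gram matrix; it is n x n regardless of the word length L.
    weighted = root[:, None] * gram * root[None, :]
    eigenvalues = np.linalg.eigvalsh(weighted)
    eigenvalues = eigenvalues[eigenvalues > tol]  # drop numerical zeros
    return float(-np.sum(eigenvalues * np.log2(eigenvalues)))


if __name__ == "__main__":
    # Two equally likely signal states with overlap 0.5 (illustrative values).
    print(quantum_state_entropy([0.5, 0.5], [[1.0, 0.5], [0.5, 1.0]]))
```

With orthogonal signal states this reduces to the Shannon entropy of the stationary distribution, and the entropy falls as the overlaps grow.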

For example, consider the Nemo Process. The sequences of overlap increments for
L = [0, 1, 2, 3, 4, 5, 6, 7, 8, ...], for the pairwise overlaps (the last being ⟨η_2|η_0⟩) respectively, are given by:

[prefactor illegible] × [0, 0, 0, a^0, a^0, a^0, a^1, a^1, a^1, ...],

[prefactor illegible] × [0, 0, a^0, a^0, a^0, a^1, a^1, a^1, a^2, ...], and

[illegible],

where a = (1 − p)/2.

And, the asymptotic cumulative overlaps are given by Eqs. (19)-(21), one for each pair of signal states.
[Right-hand sides illegible; only the left-hand side ⟨η_1|η_2⟩ of Eq. (20) survives.]

From this, we computed the restricted density matrix and, hence, its L → ∞ entropy Cq(∞) ≈ 1.0332, as
illustrated in Fig. 3. The density matrix and eigenvalue forms are long and not particularly illuminating, and so we do
not quote them here. A sequel details a yet more efficient analytic technique based on holomorphic functions of
the internal-state Markov chain of a related quantum transient structure30.

Discussion 

Recalling our original motivation, we return to the concept of pattern; in particular, its representation and
characterization. We showed that, to stand as a canonical form, a process’ quantum representation should encode,
explicitly in its states, process correlations over a sufficiently long horizon-length. In demonstrating this, our
examples and analyses found that the q-machine generally offers a more efficient quantum representation than
the alternative previously introduced1 and, perhaps more importantly, it can be constructed for any process with
a finite-state ε-machine.

Interestingly, the length scale at which our construction’s compression saturates is the cryptic order, a recently
introduced measure of causal-state merging and synchronization for classical stochastic processes. Cryptic order,
in contrast to counifilarity, makes a finer division of process space, suggesting that it is a more appropriate
explanation for super-classical compression. We also developed efficient algorithms to compute this ultimate quantum
compressibility. Their computational efficiency is especially important for large or infinite cryptic orders, which
are known to dominate process space.

We cannot yet establish the minimality of our construction with respect to all alternatives. For example, more
general quantum hidden Markov models (QHMMs) may yield a greater advantage3. Proving minimality among
QHMMs is of great interest on its own, as it would mark a canonical quantum representation of classical
stochastic processes. As we have illustrated in Sec. 4, the observed quantum compression has come at a cost: the
q-machine is not generally fully predictive (while the ε-machine is). There exist classical representations that
make a similar tradeoff: generative models can be (entropically) smaller than the ε-machine, but can only
generate instances as opposed to producing full predictive future morphs31. Teasing apart the effects of this
generative tradeoff from the purely quantum contribution to compression will require a better understanding of
classical generative models, itself a nontrivial task. Further, claims about overall minimality of quantum
representation will require first defining the appropriate space. We look forward to making contributions toward
answering these questions in future work.

And, what is the meaning of the remaining gap between Cq(k) and E? In the case that Cq(k) is in fact a
minimum, this difference should represent a quantum analog of the classical crypticity. Physically, since the latter is
connected with information thermodynamic efficiency22,32,33, it would then control the efficiency for quantum
thermodynamic processes.

Let’s close by outlining future impacts of these results. Most generally, they provide yet another motivation to
move into the quantum domain, beyond cracking secure codes34 and efficient database queries35. They promise
extremely high, super-classical compression of our data. If implementations prove out, they will be valuable for
improving communication technologies. However, they will also impact quantum computing itself, especially
for Big Data applications, as markedly less information will have to be moved when it is quantum compressed.